ABSTRACT

Persons who engage in non-suicidal self-injury (NSSI) often
conceal their practices, which limits the examination and
understanding of those who engage in NSSI. The goal of this
research is to use a public online social network (namely
LiveJournal, a major blogging network) to observe the
NSSI population's communication in a naturally occurring
setting. Specifically, LiveJournal users can publicly declare 
their interests. We collected the self-declared interests of 
22,000 users who are members of or participate in 43 NSSI- 
related communities. We extracted a bimodal socio-semantic 
network of users and interests based on their similarity. The 
semantic subnetwork of interests contains NSSI terms (such 
as "self-injury" and "razors"), references to music perform- 
ers (such as "Nine Inch Nails"), and general daily life and 
creativity related terms (such as "poetry" and "boys"). As- 
suming users are genuine in their declarations, the words 
reveal distinct patterns of interest and may signal keys to 
NSSI. 

INTRODUCTION

Non-suicidal self-injury (NSSI) is the direct, deliberate
destruction of one's own body tissue in the absence of
suicidal intent [11]. It is practiced primarily by adolescents
and young adults [4] and is often concealed from others.
Common NSSI activities include skin cutting, banging or
hitting oneself, and burning.

Recent prevalence estimates suggest that 14% to 21% of
adolescents and 17% to 25% of young adults have engaged
in NSSI at some point in their lives [5][12]. Furthermore,
NSSI is repeatedly found to be associated with significant
emotional and behavioral dysfunction (e.g., eating disorders,
suicide [10]). These findings highlight the need to enhance
understanding and prevention of NSSI and its psychiatric
sequelae.

The goal of this research is to find mechanisms that could 
identify NSSI persons by automatically analyzing secondary 
data publicly available from massive online social networks 
(MOSN), without explicitly interacting with the subjects. 
Many popular MOSNs (e.g., Facebook and LiveJournal) al- 
low users to declare their interests, either explicitly or in the 
form of "likes." While these interests are often selected ran- 
domly and polluted with "status words," we found a very 
significant correlation between interest lists and membership 
in NSSI online communities in at least one major MOSN — 
LiveJournal [9][15], a blogging social network.

This association between interest lists and NSSI commu- 
nity membership suggests that "likes" or interest lists may 
be serving as identity signals "communicating aspects of
individuals (e.g., group membership or other preferences) to
others in the social world" [2]. Such identity signals gain
greater meaning (i.e., signal value) as their association with 
group membership strengthens. From an identity-signaling 
perspective, identity signals with greater signal value can 
influence others, particularly others who aspire for group 
membership, to adopt behaviors characteristic of the larger 
group Q. 

To investigate the value of interest lists generated by mem- 
bers of NSSI online communities in LiveJournal, we used 
the declared interests as nodes and similarities between their 
users as edges to build a semantic network. The layout of the 
network consists of four clearly separated word clusters, one 
of which corresponds to the pathological terms (e.g. "self- 
injury" and "razor") and the other three refer to daily life, 
popular music, and creativity. We expect that the bridge 
terms that connect the pathology cluster with the remaining 
three clusters can be used as beacons signaling the potential 
presence of NSSI behavior.

The rest of the paper is organized as follows: in Section 1,
we describe the data acquisition process; Sections 2 and 3
explain the semantic network generation and the resulting
network organization; network comparative assessment is
presented in Section 4; in Section 5, we conclude and outline
our future research plans.

DATA COLLECTION 

Our analysis is based on the data set collected from Live- 
Journal — a popular massive online blogging social network 
site (BSN). A BSN allows individual bloggers to form con- 
tact lists, subscribe to their friends' blogs, comment on se- 
lected blog posts, declare interests, and participate in com- 
munities — collective blogs. Thus, a blogging network is a 
bimodal venue where users engage in both publishing and 
social activities [16]. As of Spring 2012, LiveJournal has
32 million individual and community accounts. A LiveJournal
user maintains his/her personal blog (public or private) and
may be a member of an unlimited number of special- and
general-interest communities.

We identified 43 NSSI-related communities in LiveJournal.¹
Users are associated with the communities either explicitly 
(by membership) or implicitly (by posting to the community 
blogs without becoming formal members, where permitted). 
Some of these communities promote NSSI activities, while 
others advocate for NSSI abstinence. 

We collected all self-declared interests of the 22,000 Live- 
Journal users associated with the selected communities (by 
membership or by posting, as described above). The total
number of harvested interests is ≈150,000, including
misspelled, abbreviated, and hyphenated variants.

Thus, we formed a matrix $M$ where $M_{ij} = 1$ iff the user
$U_i$ has declared the interest $V_j$, and $M_{ij} = 0$ otherwise.
In other words, $M$ is the adjacency matrix of a two-mode
network of users and their interests.
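
A minimal sketch of how such a matrix could be assembled, assuming the harvested data is available as a mapping from user names to lists of declared interests. The `profiles` dictionary, the function name, and the toy data are hypothetical and are not taken from the original data pipeline.

```python
import numpy as np

def build_incidence_matrix(profiles):
    """Return (M, users, interests) with M[i, j] = 1 iff users[i] declared interests[j]."""
    users = sorted(profiles)
    interests = sorted({term for terms in profiles.values() for term in terms})
    column = {term: j for j, term in enumerate(interests)}
    M = np.zeros((len(users), len(interests)), dtype=np.uint8)
    for i, user in enumerate(users):
        for term in profiles[user]:
            M[i, column[term]] = 1
    return M, users, interests

# Toy data, invented for illustration only.
profiles = {
    "user_a": ["poetry", "nirvana"],
    "user_b": ["poetry", "books"],
    "user_c": ["nirvana"],
}
M, users, interests = build_incidence_matrix(profiles)
```

Rows correspond to users and columns to interests, so the row sums give the number of interests per user and the column sums give the popularity of each interest.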

SEMANTIC NETWORK GENERATION 

We use the matrix $M$ to generate a semantic network [13]
of interests corresponding to the NSSI population. This
network is a one-mode projection of the original two-mode
network induced by the matrix $M$. It is undirected, weighted,
and signed. The nodes of the network represent interests $I_i$
and the edges represent the corresponding generalized
similarities $\Theta_{ij} \in [-1, 1]$.

Thematic (e.g., NSSI-related) communities are more homo- 
geneous than general-interest communities. They consist of 
people who are similar in a certain sense. In an extreme case, 
all community members would be uniformly interested in 
the community subject and use common terminology. This 
similarity should be taken into account when calculating
correlations between declared interests. Kovács [6] has
shown, and our findings confirm, that the agent-agnostic
Pearson correlation underestimates the proximity of terms.
Kovács's generalized similarity measures take the population
structure into account. They are defined recursively: two
terms are similar with correlation $\Theta_{ij}$ if they are
used by similar people; two people are similar with
correlation $\Phi_{ij}$ if they use similar terms
($\Theta_{ij}, \Phi_{ij} \in [-1, 1]$).

¹A complete list of communities with their posting and
membership statistics is available from the authors in
electronic form by email.



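Let $\tilde{M}_{i\cdot} = M_{i\cdot} - \bar{M}_{i\cdot}$ and
$\tilde{M}_{\cdot j} = M_{\cdot j} - \bar{M}_{\cdot j}$ be the $i$'th row
and the $j$'th column of the matrix $M$, respectively, centered
by subtracting the mean of the corresponding row or column.
Then matrices $\Theta$ and $\Phi$ can be calculated recursively
by starting with two appropriately sized identity matrices $I$:

$$\Theta_0 = I, \quad \Phi_0 = I,$$

$$\Theta_{ij,k+1} = \frac{\tilde{M}_{\cdot i}^T \Phi_k \tilde{M}_{\cdot j}}{\sqrt{\left(\tilde{M}_{\cdot i}^T \Phi_k \tilde{M}_{\cdot i}\right)\left(\tilde{M}_{\cdot j}^T \Phi_k \tilde{M}_{\cdot j}\right)}},$$

$$\Phi_{ij,k+1} = \frac{\tilde{M}_{i\cdot} \Theta_k \tilde{M}_{j\cdot}^T}{\sqrt{\left(\tilde{M}_{i\cdot} \Theta_k \tilde{M}_{i\cdot}^T\right)\left(\tilde{M}_{j\cdot} \Theta_k \tilde{M}_{j\cdot}^T\right)}}.$$

After a number of iterations the algorithm converges to the
"true" values of $\Phi \approx \Phi_\infty$ and $\Theta \approx \Theta_\infty$.
The similarities between community members, though calculated,
are not used in this study.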
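
The recursion can be sketched in a few lines of NumPy. This is an illustrative reading of the update rules, not the authors' implementation: the function names, the iteration cap `n_iter`, the stopping tolerance `tol`, and the convention of assigning zero similarity to constant rows or columns are all assumptions made for this example.

```python
import numpy as np

def _normalize(G):
    """Turn a Gram-like matrix into correlations: S[i, j] = G[i, j] / sqrt(G[i, i] * G[j, j])."""
    d = np.sqrt(np.clip(np.diag(G), 0.0, None))
    d[d == 0] = np.inf              # assumption: constant rows/columns get similarity 0
    S = G / np.outer(d, d)
    np.fill_diagonal(S, 1.0)        # every item is fully similar to itself
    return S

def generalized_similarity(M, n_iter=50, tol=1e-6):
    """Iterate term similarities (theta) and user similarities (phi) for a binary matrix M."""
    M = np.asarray(M, dtype=float)
    R = M - M.mean(axis=1, keepdims=True)   # rows (users) centered by row mean
    C = M - M.mean(axis=0, keepdims=True)   # columns (terms) centered by column mean
    theta = np.eye(M.shape[1])
    phi = np.eye(M.shape[0])
    for _ in range(n_iter):
        theta_new = _normalize(C.T @ phi @ C)    # terms are similar if used by similar users
        phi_new = _normalize(R @ theta @ R.T)    # users are similar if they use similar terms
        delta = max(np.abs(theta_new - theta).max(), np.abs(phi_new - phi).max())
        theta, phi = theta_new, phi_new
        if delta < tol:
            break
    return theta, phi

# Toy user-by-interest matrix, invented for illustration only.
M_toy = np.array([[1, 0, 1],
                  [1, 1, 0],
                  [0, 1, 1],
                  [0, 0, 1]])
theta_first, _ = generalized_similarity(M_toy, n_iter=1)
theta, phi = generalized_similarity(M_toy)
```

With the identity matrices as the starting point, the first iteration yields the ordinary Pearson correlation between columns of $M$; later iterations adjust it for the similarity structure of the population.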

As a side note, in the case of a totally heterogeneous
population, $\Phi = I$ (each person is similar only to
herself) and $\Theta$ reduces to the ordinary agent-agnostic
Pearson correlation matrix.

By construction, $\Theta$ is a dense symmetric signed square
matrix with few or no zero terms. The distribution of similarity
measures in the matrix is close to uniform. The similarities 
in the matrix are sustained by the whole body of interests and 
are robust against random variations of individually declared 
interests. 

Calculating $\Theta$ for 150,000 interests is computationally
infeasible due to time constraints and arithmetic imprecision.
We restricted our study to the 600 most often declared
interests, shared by ≈14,000 NSSI bloggers. That was the
largest matrix that could be evaluated on a 64-bit AMD
desktop computer with 8 GB of RAM.

SEMANTIC NETWORK ANALYSIS 

To explore the organization of the semantic network of
interests, we extracted some of the strongest generalized
similarities between the interests by creating another adjacency
matrix $\Psi$:

$$\Psi_{ij} = \begin{cases} \Theta_{ij} & \text{if } \Theta_{ij} > 0.8 \\ 0 & \text{else} \end{cases} \tag{1}$$

Matrix $\Psi$ is square, sparse (its density is 12%), symmetric,
undirected, weighted (in a limited range), and unsigned. It
has 42,000 non-zero entries that correspond to 21,000
network edges.
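
The thresholding step can be sketched as follows. The function names and the toy matrix are hypothetical, and dropping the diagonal (no self-loops) is an assumption, although it agrees with the reported count of 42,000 non-zero entries for 21,000 edges.

```python
import numpy as np

def strongest_similarities(theta, threshold=0.8):
    """Keep only similarities above the threshold, as in Eq. 1; everything else becomes 0."""
    psi = np.where(theta > threshold, theta, 0.0)
    np.fill_diagonal(psi, 0.0)      # assumption: self-similarities are not network edges
    return psi

def density(psi):
    """Fraction of possible undirected edges that are present."""
    n = psi.shape[0]
    edges = np.count_nonzero(np.triu(psi, k=1))
    return edges / (n * (n - 1) / 2)

# Toy similarity matrix, invented for illustration only.
theta_toy = np.array([[1.0, 0.9, 0.2],
                      [0.9, 1.0, -0.85],
                      [0.2, -0.85, 1.0]])
psi_toy = strongest_similarities(theta_toy)

# Consistency check of the figures reported in the text: 21,000 edges among 600 nodes.
reported_density = 21_000 / (600 * 599 / 2)
```

The reported figures are mutually consistent: 21,000 edges out of 179,700 possible pairs is about 11.7%, which rounds to the stated 12%.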

We used Gephi [1] to visualize the network described by
matrix $\Psi$. A sketch of the network is shown in Figure 1.²
The network has a clear hierarchical structure. It consists of
four major clusters: "music" (MUS), "pathology" (PAT),
"daily life/emotions" (DLE), and "creativity" (CRE). Some of
the most frequently declared interests from each of the
clusters are shown below:

²The detailed network map is available from the authors in
electronic form by email.




Figure 1. Semantic network of interests in the NSSI-related communi- 
ties 

MUS: atreyu, him, incubus, korn, my chemical romance, 
nirvana, rancid, system of a down, the perfect circle; 

PAT: alcohol, anorexia, bulimia, burning, cutting, handcuffs, 
pain, self-injury and self-mutilation (both with and with- 
out the dash), spikes, weeds; 

DLE: cameras, cloths, dvds, flirting, flowers, fun, quotes,
smiling, hearts (also as the HTML entity &hearts; and as <3);

CRE: astrology, books, languages, philosophy, psychology, 
Shakespeare, sociology, travel, wine. 

There is surprisingly little connectivity between the clusters
CRE and MUS. The remaining border zones are spanned
by a few important bridge interests:

PAT/MUS: (black) eyeliner, girl interrupted, metal; 

PAT/DLE: candy, girls, insomnia, red, rock music, sex; 

MUS/DLE: animals, camping, fashion, games, honesty, hu- 
mor, travelling; 

All four clusters: bands, bracelets, hoodies, lesbians, mak- 
ing out. 

Since, from the point of view of the NSSI users, the bridge
terms are similar both to pathological and non-pathological
terms, their occurrence in a text may be a signal of
potential NSSI behavior.

The presence of the tightly interwoven MUS component is 
equally surprising, given that music is not an explicit topic 
in any of the NSSI communities. 



SEMANTIC NETWORK COMPARATIVE ASSESSMENT 

While many of the associations shown in the map in Figure 1
may be specific to the NSSI community members, some may
be either totally random or specific to the age or cultural
group to which these members belong. Thus, some
associations that seemingly suggest NSSI behavior may turn out
to be misleading.

To identify NSSI-specific information, we attempted to
compare the NSSI communities to a random sample of
LiveJournal users whose declared age distribution was similar
to that of NSSI users. The members of the chosen sample seem
to share very few interests with the NSSI population under
study. This is not surprising, given that the ≈32 million
LiveJournal users belong to different ethnic and cultural
groups and, when chosen at random, are unlikely to share
interests expressed by a small number of words from a very
limited vocabulary.

Next we tried to find appropriate communities that may share 
interests with the NSSI population even given a very lim- 
ited power of expression. After some research we identified 
two LiveJournal communities that have demographics simi- 
lar to the NSSI communities and focus on non-pathological 
topics: "sexy-mood-music" (SMM, 6,700 members, aver- 
age age 25 years) and "movies-in-fifteen-minutes" (M15M, 
13,300 members, average age 28 years). These communities 
cater to music and video fans, respectively. 

We collected the top 450 and 550 most frequently used
interests of the SMM and M15M members, respectively, and used
the technique described above to generate their semantic maps
$\Psi_{\mathrm{SMM}}$ and $\Psi_{\mathrm{M15M}}$. We then calculated the
intersection between the NSSI semantic network and each of the
other semantic networks under consideration. The intersection
contains the associations that are significant for both
communities and presumably are pathology-free.

We combine semantic network edges using fuzzy set
theoretical operations for intersection and difference:

$$A \cap B = \min(\alpha, \beta) \tag{2}$$
$$A \setminus B = \min(\alpha, 1 - \beta) \tag{3}$$

Here, $A$ and $B$ are edges, and $\alpha$ and $\beta$ are the
generalized similarities associated with the edges.
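
As a small worked example of Eqs. 2 and 3, the two operations can be written as plain functions on edge weights. The function names and the sample weights are hypothetical.

```python
def fuzzy_intersection(alpha, beta):
    """Eq. 2: weight of an edge present in both networks."""
    return min(alpha, beta)

def fuzzy_difference(alpha, beta):
    """Eq. 3: weight of an edge of the first network after discounting the second."""
    return min(alpha, 1.0 - beta)

# Sample weights, invented for illustration only.
shared = fuzzy_intersection(0.9, 0.85)
residual = fuzzy_difference(0.9, 0.85)
```

Because every retained edge has a weight above 0.8, the difference weight of an edge found in both networks is always below 0.2, so such edges are strongly attenuated rather than removed outright.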

Let $\Psi \cap X$ be the intersection of the original network
$\Psi$ and a comparison network $X \in \{\mathrm{SMM}, \mathrm{M15M}\}$.
Then for each edge $e_{ij} = (V_i, V_j)$ in $\Psi$, if the
corresponding edge also exists in $X$, then this edge is added
to the intersection network with the weight calculated using
Eq. 2. Otherwise, its weight is taken to be 0 to emphasize the
lack of similarity between the two terms:

$$(\Psi \cap X)_{ij} = \begin{cases} \Psi_{ij} \cap X_{ij} & \text{if } e_{ij} \in (\Psi \cap X) \\ 0 & \text{else} \end{cases} \tag{4}$$

In other words, if two terms are considered substantially
similar both in $\Psi$ and in $X$, then their similarity is
universal with respect to $\Psi$ and $X$, that is, neither
$\Psi$- nor $X$-specific.
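
A minimal sketch of the intersection network, assuming each network is stored as a dictionary that maps an unordered pair of terms to its edge weight. The representation, the helper names, and the toy edges are assumptions made for this example; they are not taken from the original study.

```python
def edge(a, b):
    """Undirected edge key: the order of the two terms does not matter."""
    return frozenset((a, b))

def intersection_network(psi, chi):
    """Keep edges present in both networks, weighted by the smaller of the two weights (Eq. 2)."""
    return {e: min(w, chi[e]) for e, w in psi.items() if e in chi}

# Toy networks, invented for illustration only.
nssi = {
    edge("cutting", "pain"): 0.95,
    edge("poetry", "books"): 0.90,
}
comparison = {
    edge("books", "poetry"): 0.85,
    edge("movies", "dvds"): 0.90,
}
common = intersection_network(nssi, comparison)
```

Edges missing from either network simply do not appear in the result, which corresponds to the zero weight in the second case of the definition.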




The resulting intersection networks for $X \in \{\mathrm{SMM}, \mathrm{M15M}\}$
have fewer nodes (300 and 220) and about the same density
(9% and 12%). We analyzed them using Gephi (Figure 2) and
discovered that they have a remarkably similar structure:
both networks have dense and strongly connected CRE and DLE
clusters, and smaller music/movies clusters that are less
connected to the "mainland." The latter clusters, in turn,
consist of easily identifiable "movies" (MOV) and MUS
subcomponents.

The intersection networks with their structural similarity ex- 
hibit many characteristics observed in the typical adolescent 
development Q . 

Next we adjust the original network with respect to the com- 
parison networks as a way to better distinguish the features 
of the NSSI semantic network. 

Let $\Psi \setminus X$ be the difference between the original
network $\Psi$ and a comparison network $X$. If edge $e_{ij}$
exists in $\Psi$ but not in $X$, then it is essential and is
inserted in the difference network with its original weight.
If the edge exists in both networks, it is inserted with the
weight calculated using Eq. 3. Otherwise, the edge exists only
in $X$; it is irrelevant and is not inserted:

$$(\Psi \setminus X)_{ij} = \begin{cases} \Psi_{ij} & \text{if } e_{ij} \in \Psi \text{ and } e_{ij} \notin X \\ \Psi_{ij} \setminus X_{ij} & \text{if } e_{ij} \in \Psi \text{ and } e_{ij} \in X \\ 0 & \text{else} \end{cases} \tag{5}$$

In other words, if two terms are considered substantially
similar in $\Psi$ but not in $X$, then their similarity is
$\Psi$-specific but not $X$-specific. These terms may be
perceived as similar by the NSSI users because of their NSSI
pathologies.
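
The adjustment can be sketched in the same spirit, again assuming that each network is a dictionary from unordered term pairs to edge weights. The representation, the helper names, and the toy edges are hypothetical.

```python
def edge(a, b):
    """Undirected edge key: the order of the two terms does not matter."""
    return frozenset((a, b))

def difference_network(psi, chi):
    """Adjust the edges of psi with respect to the comparison network chi.

    An edge found only in psi keeps its original weight; an edge found in
    both networks gets the fuzzy difference min(w, 1 - chi weight) of Eq. 3;
    an edge found only in chi is never inserted.
    """
    return {
        e: (min(w, 1.0 - chi[e]) if e in chi else w)
        for e, w in psi.items()
    }

# Toy networks, invented for illustration only.
nssi = {
    edge("cutting", "pain"): 0.95,
    edge("poetry", "books"): 0.90,
}
comparison = {
    edge("books", "poetry"): 0.85,
    edge("movies", "dvds"): 0.90,
}
adjusted = difference_network(nssi, comparison)
```

In this toy case the "cutting"/"pain" edge survives at full strength, while the "poetry"/"books" edge, which also exists in the comparison network, drops to about 0.15.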

Figure 3. Semantic network of interests in the NSSI-related
communities, adjusted for common interests; cf. Figure 1.

The adjusted NSSI interest network with respect to
$X = \mathrm{M15M}$ is shown in Figure 3. The new network has
much less dense CRE and DLE clusters; cf. the original network
in Figure 1. The PAT and especially MUS clusters are still
very dense. From Figure 3 we can identify two groups of
possible NSSI beacon interests:

Non-PAT interests in PAT: angelina jolie, bdsm, beer, be- 
ing alone, bisexuality, black, boots, crying, dying, fire, 
fishnets, goth(ic), graveyards, hair dye, horror, industrial, 
lust, perfection, porn, serial killers, sex, tattoos, tears, vam- 
pires, wicca, witchcraft, etc. 

Bridge interests: anxiety, bracelets, corsets, edgar allan poe,
emotions, (black) eyeliner, girl interrupted, girls, glitter, 
horror movies, insomnia, leather, lesbians, magick (sic), 
marylin monroe, (heavy) metal, night, poems, red, safety 
pins, screaming, spikes, techo, tori amos, etc. 

Our findings also appear indicative of the growing global
middle-class youth culture revolving around leisure
activities (e.g., music, art) reflecting adolescent development
in internationally-connected networks [7]. This is further
supported by the similarities between the NSSI-interested
communities and the non-pathological comparison communities.
Notably, both sets of communities included entertainment,
creativity, and daily life clusters.

CONCLUSION AND FUTURE WORK 

Exposure to NSSI via Internet use (e.g., MOSNs, YouTube)
may facilitate the adoption and maintenance of NSSI among
vulnerable individuals via social contagion processes [8][14].
With this in mind, we constructed a semantic network of
interests declared by NSSI bloggers of LiveJournal. The
network consists of four clearly separated interest clusters
corresponding to pathological terms (e.g., "self-injury" and
"razor"), daily life, popular music, and creativity. The
interests that bridge gaps between the pathology cluster and
the other three clusters can be used as beacons signaling the
potential presence of NSSI behavior. These bridge interests
appear to be valuable identity signals [2] serving as linkages
between NSSI group membership and larger youth culture.
Future research targeting individuals' use of these bridge
terms as a means to identify NSSI-oriented social groups
would further support this interpretation and could inform
prevention efforts aimed at early identification of
vulnerable individuals at risk for NSSI.

In related research, individuals with a history of NSSI are
found to view themselves negatively (e.g., as less intelligent
and more emotionally unstable) and as having lower social
capital (e.g., less attractive, with weak social skills) [3].
The extent of NSSI-related communities on LiveJournal could
be evidence of the limited opportunities for social networking
among people (e.g., self-harmers) who find themselves
excluded from their local communities and local peer networks.
Future work is needed to examine how members of
NSSI-related communities use MOSNs to affirm a sense of
meaning, obtain social support, and expand social capital.
At the same time, increased time in unstructured peer
interactions via NSSI-related MOSNs may lead to further in- 
volvement in deviant and antisocial behavior in early adult- 
hood 0. 

In this study, we considered only self-declared interests
displayed on the bloggers' profile pages. Some of these
interests may have been chosen randomly or based on certain
(sub)cultural considerations, and do not necessarily reflect
the users' real attractions. As the next step in this direction,
we plan to study keywords in the messages posted to the 
NSSI communities. We expect that the free-form language 
of the messages is a better proxy for the pathological be- 
havior. If our hypothesis is right, then the semantic network 
generated from the keywords will differ from the network 
constructed in this study. The area of overlap will 
probably be the cluster of "true" NSSI interests. 